A Model for k-Nearest Neighbor Query Cost in Multidimensional Index Structures
Abstract
The k-nearest neighbor query is one of the most frequently used query types on multidimensional index structures in multimedia databases and geographic information systems. Until now, most analytic models have been restricted to a particular type of index structure, such as the R-Tree, and have concentrated on the analysis of the range query. Recently, a cost model [3] was reported for nearest neighbor queries; however, it considered only 1-nearest neighbor queries rather than k-nearest neighbor queries. In this paper, we present an analytic model for the cost of the k-nearest neighbor query in multidimensional index structures. As a basis of the model, we introduce the concepts of the regional average volume and the varying density function. The model has three main advantages: it is applicable to datasets with arbitrary distributions (uniform and non-uniform), it works for k- as well as 1-nearest neighbor queries, and it is a dynamic analysis method that enables rapid analysis without requiring a time-consuming simulation of data. To evaluate the accuracy of our model, we conducted a wide range of experiments on datasets with various distributions. The results show that our analytic model is accurate for datasets with non-uniform as well as uniform distributions in low and mid dimensions.
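For readers unfamiliar with the query type, the following is a minimal brute-force sketch of a k-nearest neighbor query in Python. It is not the cost model summarized above, and it uses no index structure; the function name `knn_query` and the sample points are hypothetical and chosen only for illustration. The distance to the k-th neighbor gives the radius of the search sphere around the query point, which is the kind of quantity a cost model for index page accesses typically has to estimate.

```python
import heapq
import math


def knn_query(points, query, k):
    """Return the k points closest to `query` as (distance, point) pairs,
    ordered from nearest to farthest (Euclidean distance, linear scan)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Pair every point with its distance to the query point.
    scored = ((math.dist(p, query), p) for p in points)
    # Keep only the k smallest distances without sorting the whole dataset.
    return heapq.nsmallest(k, scored)


# Hypothetical 2-dimensional sample data.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
neighbors = knn_query(points, (0.0, 0.0), k=2)

# Radius of the sphere that just encloses the k nearest neighbors.
radius = neighbors[-1][0]
```

With `k=1` the same function answers a 1-nearest neighbor query, so the 1-NN case is simply a special case of the k-NN case.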
Related resources
Non-zero probability of nearest neighbor searching
Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition and computational geometry. The goal of NN searching is efficiently reporting the nearest data to a given object as a query. In most of the studies both the data and query are assumed to be precise, however, due to the real applications of NN searching, suc...
A Model for k-Nearest Neighbor Query Processing Cost in Multidimensional Data Space
A cost model for the performance of the k-nearest neighbor query in multidimensional data space is presented. Two concepts, the regional average volume and the density function, are introduced to predict the performance for uniform and non-uniform data distributions. The experiment shows that the prediction based on this model is accurate within an acceptable range of the error in low and mid d...
Software Cost Estimation by a New Hybrid Model of Particle Swarm Optimization and K-Nearest Neighbor Algorithms
A successful software should be finalized with determined and predetermined cost and time. Software is a production which its approximate cost is expert workforce and professionals. The most important and approximate software cost estimation (SCE) is related to the trained workforce. Creative nature of software projects and its abstract nature make extremely cost and time of projects difficult ...
An efficient nearest neighbor search in high-dimensional data spaces
Similarity search in multimedia databases requires an efficient support of nearest neighbor search on a large set of high-dimensional points. A technique applied for similarity search in multimedia databases is to transform important properties of the multimedia objects into points of a high-dimensional feature space. The feature space is usually indexed using a multidimensional index structure...
Drought Monitoring and Prediction using K-Nearest Neighbor Algorithm
Drought is a climate phenomenon which might occur in any climate condition and all regions on the earth. Effective drought management depends on the application of appropriate drought indices. Drought indices are variables which are used to detect and characterize drought conditions. In this study, it was tried to predict drought occurrence, based on the standard precipitation index (SPI), usin...